Thinking early in the publication cycle about how best to present and preserve media assets such as video allows lead time to implement preservation best practices. These include procuring or licensing media for local hosting or solely for preservation, and choosing remote services that are better suited to web harvesting.
Publications with embedded media content often require that these files be compressed, streamed, or otherwise optimized for delivery and access. When possible, acquire and retain copies of the original high-resolution media files for long-term preservation. Formats for preservation should be open, non-proprietary, or widely adopted. Higher-quality uncompressed versions of media files are preferred for inclusion in an archival package. Note, however, that in some cases the original media may have been modified for privacy or narrative reasons and not published in full. Services that preserve publications typically want the version of record, so the best-quality version of the published file should be sent to the publication archive.
Where there is a choice of which file format to use, consider these guidelines:
11. Use non-proprietary, broadly supported and adopted open file formats
65. Remove private data from export packages
Move supporting files, such as multimedia, fonts, JavaScript, and CSS, so that they are local to the publication or stored inside the application used for publishing. This helps ensure that the vital components of the work can easily be packaged together, reduces ongoing maintenance, and helps ensure that exports contain all necessary resources.
If this is impractical in the live environment, other guidelines may be relevant:
15. Develop a strategy to capture any external media content
16. Captions for non-text features add meaningful context
20. Ensure all core intellectual components of a work are reflected in the export package
29. Consider a preservation-specific EPUB in your workflow
51. Host media files local to the website
72. Record a walkthrough of features with important layout or interactivity
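One way to find supporting files that are still served from elsewhere is to scan each page for resource references and compare each host with the publication's own. The following is a minimal standard-library sketch. The function and class names, the tag list, and the example URLs are illustrative assumptions. The sketch does not see references made from CSS (such as @font-face or @import) or resources added by scripts at run time.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tags and attributes that commonly pull in supporting files. This list is an
# assumption; it does not cover srcset, CSS @import/@font-face, or script loads.
RESOURCE_ATTRS = {
    "script": "src", "img": "src", "audio": "src", "video": "src",
    "source": "src", "track": "src", "iframe": "src", "embed": "src",
    "link": "href",  # stylesheets, fonts, icons (and any other <link>)
}


class ResourceCollector(HTMLParser):
    """Collect the raw URLs of supporting files referenced by a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr = RESOURCE_ATTRS.get(tag)
        value = dict(attrs).get(attr) if attr else None
        if value:
            self.urls.append(value)


def external_resources(html, page_url):
    """Return absolute URLs of resources not served from the page's own host."""
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    site_host = urlparse(page_url).netloc
    external = []
    for raw in collector.urls:
        absolute = urljoin(page_url, raw)  # resolve relative paths
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip data:, mailto:, and similar URLs
        if parsed.netloc != site_host:
            external.append(absolute)
    return external
```

When this is run against each page of a publication, the returned list serves as a to-do list of files to bring local or to capture by other means.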
Sometimes it is necessary or preferable to reference or embed third-party content that is outside of the control of the publisher but integral to the understanding of the work. For these features, anticipate that their availability may be temporary and make plans to ensure that they are not only preserved, but sustained in some form as part of the publication while they are on the publisher platform. In the case of an embedded YouTube video, for example, some options to support preservation might include: retaining or requesting a copy of the video file; getting permission to copy the content directly from YouTube using a downloader tool in order to bring it into the local publication; or web archiving the video page and linking to the archived copy, e.g. on the Internet Archive. An informative caption can help support future readers if the content is unavailable.
These guidelines may also improve the preservability of media hosted by third parties:
12. Start discussions about multimedia early in the project
14. Avoid externally hosted media
16. Captions for non-text features add meaningful context
20. Ensure all core intellectual components of a work are reflected in the export package
39. Avoid the use of iframes to embed multimedia
42. Facilitate a local web archive workflow for iframe content
Owning My Masters (Mastered): The Rhetorics of Rhymes & Revolutions by A.D. Carson includes an annotated interactive timeline created using the Northwestern University Knight Lab’s TimelineJS. A simplified text representation of this timeline is included in the EPUB on the Fulcrum publishing platform. The interactive version, hosted at the University of Virginia and embedded on the author's website using an iframe, is linked as an external resource. The timeline is configured from data stored in a Google Sheet owned by the author. A web archive file (WARC) of the interactive timeline site and a CSV of the Google Sheet are included as hosted resources on Fulcrum and are available for download. Since Fulcrum resources are included in the export, the archived web page (WARC file) and the text version are both part of the preserved copy.
Embedded enhanced features, especially those that link to resources outside the publication or use an unusual format, are at the highest risk of failing in the future. For this reason, a meaningful caption is vital. It gives future readers clues about what they should expect to find at that location in the text and, preferably, a means of finding and accessing it. Ideally, the caption would include a title, the source, a unique persistent identifier (e.g. DOI, ARK ID, or Handle), and a link to an archived copy if that differs from the identifier. Though any link could ultimately fail, this information would at least suggest where a user might find an archived copy. When creating captions, apply the standards available within the format you are using to support automated parsing. For example, use the HTML5 <figure> and <figcaption> elements. Alt attributes are also widely used to supply a description in case a feature cannot be viewed. In this respect, a meaningful caption may also help meet standards for digital accessibility. For a fuller treatment of this topic, see the poster presentation Embedding Preservability: IFrames in Complex Scholarly Publications, presented at the iPRES 2023 conference.
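A caption that carries the elements listed above can be generated consistently, which also makes it easier for tools to parse later. This is a minimal sketch. The function name, the use of <cite> for the title, and the example identifiers are assumptions rather than a required structure.

```python
from html import escape


def figure_html(embed_html, title, source, identifier=None, archived_url=None):
    """Wrap an embedded feature in <figure>/<figcaption> with preservation clues.

    embed_html is trusted markup for the feature itself; the caption fields are
    plain text, so each one is escaped before it is inserted.
    """
    parts = [f"<cite>{escape(title)}</cite>.", f"Source: {escape(source)}."]
    if identifier:
        link = escape(identifier)
        parts.append(f'Identifier: <a href="{link}">{link}</a>.')
    # Only add the archived copy when it is a different URL from the identifier.
    if archived_url and archived_url != identifier:
        link = escape(archived_url)
        parts.append(f'Archived copy: <a href="{link}">{link}</a>.')
    caption = " ".join(parts)
    return f"<figure>\n  {embed_html}\n  <figcaption>{caption}</figcaption>\n</figure>"
```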
Where non-text features are supplied as separate publication resources, this guideline may also be relevant:
24. Create metadata for each publication resource
On the Fulcrum platform, University of Michigan Press requires authors to create alt text for all images and caption descriptions for media files. U-M Press’s Author’s Guide documentation includes a definition of alt text, as well as links to the Describing Visual Resources Toolkit and Sample Textual Descriptions for Illustrative Materials.
Some platforms support assigning each publication resource its own descriptive metadata and landing page, making it possible to cite resources independently of the text as a whole. In these cases, the publisher may have the capacity to assign unique persistent identifiers, such as valid DOIs, ARK IDs, or Handles, to each publication resource and to provide them as part of the metadata. If so, this can help maintain connections between the components of a publication and sustain citation links. Consider a video embedded in an EPUB with a caption beneath it that includes a registered DOI. The DOI points to a page dedicated to the published video. If the publisher no longer has that material, a preservation service may have the option to register the location of its preservation copy with the DOI registration agency, so that the DOI would resolve to the new location. If a resource is local to the publication and is not intended to be cited or described independently, a meaningful caption provides useful context, but creating persistent identifiers isn’t necessary.
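When identifiers are stored in resource-level metadata, it helps to record them in one normalized form. The following sketch accepts common ways of writing a DOI and returns a doi.org resolver URL. The regular expression is the pattern Crossref has suggested for matching modern DOIs, and it deliberately rejects some unusual legacy DOIs. The record's field names are hypothetical.

```python
import re

# Crossref's suggested pattern for modern DOIs (case-insensitive); conservative
# by design, so some valid but unusual DOIs will be rejected.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

PREFIXES = ("https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(value):
    """Return the https://doi.org/ URL for a DOI, or raise ValueError."""
    doi = value.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {value!r}")
    return "https://doi.org/" + doi


def resource_record(title, media_type, doi, caption):
    """A minimal per-resource metadata record (field names are illustrative)."""
    return {"title": title, "media_type": media_type,
            "identifier": normalize_doi(doi), "caption": caption}
```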
These guidelines also relate to the use of identifiers:
17. Use persistent identifiers to link or cite external resources
24. Create descriptive metadata for each publication resource, include identifiers
31. Assign persistent identifiers to significant versions
The EPUB specification defines a list of core media types that are supported. Using formats outside of this list introduces an additional risk for preservation since EPUB reader tools may not support these formats. Publishers should therefore consider whether using something outside of these types is justifiable given that doing so may result in the loss of that media.
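A simple pre-publication check can flag files that fall outside the core list. This sketch abridges the EPUB 3.3 core media types to the image and audio rows. The specification treats video separately, so video is left out here. The extension map is an illustrative assumption, since a real workflow would read the media-type values declared in the package manifest.

```python
from pathlib import PurePosixPath

# Image and audio rows of the EPUB 3.3 core media types table (abridged). Check
# the current specification before relying on this list; video is handled
# separately by the specification and is left out of this sketch.
CORE_IMAGE_AUDIO = {
    "image/gif", "image/jpeg", "image/png", "image/svg+xml", "image/webp",
    "audio/mpeg", "audio/mp4",
}

# Illustrative extension map; a real workflow would use the manifest's media types.
EXTENSION_TYPES = {
    ".gif": "image/gif", ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
    ".png": "image/png", ".svg": "image/svg+xml", ".webp": "image/webp",
    ".mp3": "audio/mpeg", ".m4a": "audio/mp4",
    ".tif": "image/tiff", ".tiff": "image/tiff", ".flac": "audio/flac",
}


def non_core_media(filenames):
    """Return (filename, media type) pairs that fall outside the core list."""
    flagged = []
    for name in filenames:
        media_type = EXTENSION_TYPES.get(PurePosixPath(name).suffix.lower())
        if media_type not in CORE_IMAGE_AUDIO:
            flagged.append((name, media_type))  # None means "unknown extension"
    return flagged
```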
This more general guideline may also be useful to consider:
11. Use non-proprietary, broadly supported and adopted open file formats
Embedding media resources within an EPUB ensures that a future reader will be able to locate these resources and view them in the original context of the work. To keep the overall size of an EPUB manageable for access, it may be advantageous to embed lower-quality copies of the media and link to higher-resolution versions via persistent links such as DOIs.
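The smaller embedded copy can be produced by a script that never overwrites the high-resolution original. This sketch only builds an ffmpeg command line. It assumes ffmpeg is installed with the libx264 encoder. The file paths, 720-pixel height, and quality setting are placeholder choices, not recommendations.

```python
import shlex


def access_copy_command(master, output, height=720, crf=28):
    """Build an ffmpeg command that writes a smaller H.264/AAC MP4 access copy.

    The master file is only read; the derivative is written to `output`.
    Sources shorter than `height` pixels would be upscaled, so adjust as needed.
    """
    return [
        "ffmpeg", "-i", master,
        "-vf", f"scale=-2:{height}",          # fixed height; width keeps aspect ratio (even)
        "-c:v", "libx264", "-crf", str(crf),  # higher CRF = smaller file, lower quality
        "-preset", "medium",
        "-c:a", "aac", "-b:a", "128k",
        "-movflags", "+faststart",            # index at the start so playback can begin early
        output,
    ]


cmd = access_copy_command("masters/interview.mov", "OEBPS/media/interview.mp4")
print(shlex.join(cmd))
# To run it where ffmpeg is available: subprocess.run(cmd, check=True)
```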
These guidelines also refer to where media content is hosted:
14. Avoid depending on externally hosted web services in general
29. Consider a preservation-specific version of the EPUB
Where there is a strong justification for using remote resources or non-core media types, EPUB supports a fallback mechanism that allows a supported alternative to be displayed instead. This mechanism should be used in these instances.
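A manifest fallback chain can be checked before publication by following it roughly as a reading system would. The check starts at the non-core item and steps through fallback references until it reaches a core media type. This is a simplified sketch using Python's standard XML parser. It abridges the core media types to images and audio. The example package is trimmed to its manifest, and its ids and paths are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF = {"opf": "http://www.idpf.org/2007/opf"}

# Abridged core media types (images and audio only) for this sketch.
CORE = {"image/gif", "image/jpeg", "image/png", "image/svg+xml", "image/webp",
        "audio/mpeg", "audio/mp4"}


def resolve_fallback(opf_xml, item_id):
    """Follow an item's manifest fallback chain to the first core media type.

    Returns that item's href, or None if the chain never reaches a core type
    (or loops), which is the case where the media is at risk of being lost.
    """
    manifest = ET.fromstring(opf_xml).find("opf:manifest", OPF)
    items = {item.get("id"): item for item in manifest.findall("opf:item", OPF)}
    seen = set()
    current = items.get(item_id)
    while current is not None and current.get("id") not in seen:
        seen.add(current.get("id"))
        if current.get("media-type") in CORE:
            return current.get("href")
        current = items.get(current.get("fallback"))  # None ends the chain
    return None


PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="map-tiff" href="images/map.tif" media-type="image/tiff" fallback="map-png"/>
    <item id="map-png" href="images/map.png" media-type="image/png"/>
    <item id="scan" href="images/scan.jp2" media-type="image/jp2"/>
  </manifest>
</package>"""

print(resolve_fallback(PACKAGE, "map-tiff"))  # images/map.png
print(resolve_fallback(PACKAGE, "scan"))      # None: no supported fallback
```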
These guidelines may also be relevant when considering use of non-core media types:
29. Consider a preservation-specific version of the EPUB
41. Harvesting the content of iframes may have unpredictable outcomes
If externally linked web content must be visually embedded in an EPUB, recognize that it is at very high risk of loss. If the content cannot be moved inside the EPUB container using supported features, it should have an informative caption and be described clearly in the structural metadata within the EPUB. Specifically, the package manifest should (a) include an item that specifies the resource’s URL, (b) list “remote-resources” as a property of the content document that embeds it, and (c) define a fallback item. The embedded web content may not be supplied to the preservation service but may still be harvestable. In that case, this additional metadata could support a preservation workflow that identifies and captures these features using an appropriate harvesting tool. If, for example, a visually embedded Google Trends chart no longer displays active content in the future, an archived web page with this chart could be accessed instead. This content should be noted consistently and documented as part of the publication that needs to be preserved. In general, consistency that makes it easy to automatically identify the visually embedded web-based features within the text increases the chance of designing a scalable workflow to manage them.
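The manifest entries described above might be generated as follows. In EPUB 3, the remote-resources property goes on the content document that references the remote resource. The remote resource gets its own item with an absolute URL. The sketch follows that arrangement. All ids, paths, and URLs are hypothetical. The EPUB 3.3 specification also restricts which kinds of resources may be remote, so check it before relying on this pattern.

```python
from xml.sax.saxutils import quoteattr


def remote_manifest_items(chapter_id, chapter_href, remote_id, remote_url,
                          remote_type, fallback_id, fallback_href, fallback_type):
    """Return OPF <item> elements for a chapter that embeds a remote resource."""
    def item(item_id, href, media_type, **extra):
        attrs = {"id": item_id, "href": href, "media-type": media_type, **extra}
        # quoteattr adds the surrounding quotes and escapes &, < and >
        return "<item " + " ".join(f"{k}={quoteattr(v)}" for k, v in attrs.items()) + "/>"

    return "\n".join([
        # (b) the content document that references the remote resource
        item(chapter_id, chapter_href, "application/xhtml+xml",
             properties="remote-resources"),
        # (a) the remote resource itself, by absolute URL, with (c) a fallback
        item(remote_id, remote_url, remote_type, fallback=fallback_id),
        # the local fallback item
        item(fallback_id, fallback_href, fallback_type),
    ])


print(remote_manifest_items(
    "ch03", "text/ch03.xhtml",
    "field-video", "https://media.example.org/field-recording.mp4", "video/mp4",
    "field-video-desc", "text/field-video-description.xhtml", "application/xhtml+xml"))
```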
These guidelines may also be relevant to embedding web content in an EPUB:
16. Captions for non-text features add meaningful context
40. Indicate the license status of a resource in the HTML around the object
41. Use HTML iframes with caution
42. Facilitate a local web archive workflow for iframe content
Iframe, short for “inline frame,” is an HTML tag that can be used to embed the content from any URL inside an HTML-based document such as an EPUB or web page. Some publishers use iframes to embed things like YouTube videos or advanced media players in an EPUB. It is more sustainable to use the HTML <video> or <audio> elements when embedding audio or video. EPUB 3 readers are not required to support iframes. If iframes are used, the content may not render in all EPUB 3 readers, and it is at high risk of loss through link rot.
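Before deciding how to handle iframes, it can help to inventory them. This sketch lists the src of every iframe in a content document and, for comparison, counts native <video> and <audio> elements. The names are illustrative. The resulting URL list is the kind of information that could be recorded in the package metadata.

```python
from html.parser import HTMLParser


class EmbedInventory(HTMLParser):
    """Record iframe sources and count native media elements in one document."""

    def __init__(self):
        super().__init__()
        self.iframe_srcs = []   # src of each <iframe>, in document order
        self.native_media = 0   # number of <video> and <audio> elements

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            self.iframe_srcs.append(dict(attrs).get("src"))  # None if no src
        elif tag in ("video", "audio"):
            self.native_media += 1


def inventory(xhtml):
    """Return (iframe sources, native media count) for an (X)HTML string."""
    parser = EmbedInventory()
    parser.feed(xhtml)
    parser.close()
    return parser.iframe_srcs, parser.native_media
```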
These guidelines may also be relevant to embedding media in EPUBs:
12. Start discussions around multimedia early in the process
14. Avoid external dependencies in general
34. Opt for core media types when embedding multimedia in an EPUB
Some preservation services will not collect web content outside the agreed-upon domain names unless copyright for the content being harvested is clear. Third-party pages and features that are visually embedded in an EPUB or a web-based publication may be meant to be preserved. If so, it should be possible to identify which of them the publisher has the right to collect, so that a web crawler can be configured to include or exclude them. One way to communicate these rights is to express them in the metadata supplied to the preservation service. Another option is to add structured metadata describing the rights status to the HTML. The Creative Commons REL documentation includes examples of this that cover both page- and object-level licenses. This approach could support automated harvesting decisions at either level. Alternatively, a publisher could supply a list of domain names to include in the harvest during the initial configuration of the preservation workflow.
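Both options in the paragraph above lend themselves to simple automation. This sketch reads license links marked with rel="license", the pattern used in CC REL examples. It also tests URLs against a publisher-supplied list of domains. The function names and domains are hypothetical, and a real crawler would apply its own scoping rules.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LicenseLinks(HTMLParser):
    """Collect hrefs of <a> and <link> elements whose rel includes "license"."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()  # rel is space-separated
        if tag in ("a", "link") and "license" in rels and attrs.get("href"):
            self.licenses.append(attrs["href"])


def page_licenses(html):
    """Return license URLs declared in an HTML page, in document order."""
    parser = LicenseLinks()
    parser.feed(html)
    parser.close()
    return parser.licenses


def in_harvest_scope(url, allowed_domains):
    """True if the URL's host is an allowed (lower-case) domain or a subdomain."""
    host = urlparse(url).hostname or ""  # hostname is already lower-cased
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```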
These guidelines may also be useful to consider when embedding external web content:
25. Add license information to resource-level metadata
38. List the URLs for external web content in the metadata
45. Embed metadata that includes a license in the <head> of a web page
70. Consider systematically tagging components that should be excluded from preservation
An HTML iframe can contain many types of content from many different sources, which makes iframes a challenge for preservation. The quality of automated website archiving in general can vary greatly. For an iframe embedded in an EPUB or website, the more inconsistent, complex, and dynamic its content, the more likely it is to be lost in an automated process. If these features are important to preserve, consider a manual process to capture and package the intellectual components of the iframe content in another form. For example, a video or screenshot with a caption that links to the website might be a sufficient fallback for conveying the contents of the iframe.
These guidelines may also be relevant to use of iframes:
38. List the URLs for each embedded iframe in the metadata
39. Avoid use of iframes in EPUBs
42. Facilitate a local web archiving workflow to support iframes
Preservation services might not support a workflow that automatically harvests the content of iframes embedded within an EPUB. Even if such harvesting is supported, the quality could vary greatly, and the content might change after publication. If fallback options are not sufficient, a more stable approach is for the publisher to create an archived copy of the web page featured in the iframe. Tools such as Webrecorder’s ArchiveWeb.page can be run locally by the publisher to perform single-page archiving. Third-party archiving services such as archive.today or the Internet Archive’s Save Page Now also allow you to archive a single page and generate a persistent link for the embedded web content. This link could be included in a descriptive caption under the embedded feature. Publishers should test the outcome of these single-page captures, as quality can vary depending on the complexity of the website and the harvest method applied.
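When a page is captured with the Internet Archive’s Save Page Now, the resulting link follows the Wayback Machine’s pattern: a 14-digit timestamp followed by the original URL. The following helpers only build URLs and make no network requests. The names are illustrative. The service’s behavior, including any login requirements or rate limits, may change, so treat the save URL as a convenience rather than a guaranteed API.

```python
import re

WAYBACK = "https://web.archive.org"


def save_page_now_url(target):
    """URL that requests a Save Page Now capture of `target` when opened."""
    return f"{WAYBACK}/save/{target}"


def wayback_url(target, timestamp):
    """Link to one capture; `timestamp` is the capture's 14-digit YYYYMMDDhhmmss."""
    if not re.fullmatch(r"\d{14}", timestamp):
        raise ValueError("expected a 14-digit timestamp such as 20230915123000")
    return f"{WAYBACK}/web/{timestamp}/{target}"
```

The archived link, rather than the live one, is what would go into the descriptive caption, once the capture has been checked for correct playback.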
These guidelines may also be relevant:
14. Avoid dependence on externally hosted platforms for core features
15. Plan a strategy for preservation when third party dependencies exist
39. Avoid the use of iframes to embed multimedia
Linking to media hosted on YouTube or Vimeo is a threat to platform and content longevity, especially for media owned or managed by third parties. To mitigate future link rot and the general instability of archiving streamed content, host a local copy of any media assets where appropriate (technically and legally), and embed it in the web page using standard HTML5 media tags. To keep the overall size of embedded media manageable for access and for web archiving, it may be advantageous to embed lower-quality copies of the media and link to higher-resolution versions via persistent links such as DOIs.
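Markup for a locally hosted access copy can stay very plain. In this sketch, the file paths, media type, and DOI are placeholders. The link inside the <video> element is fallback content, which browsers show only when they cannot play the video.

```python
from html import escape


def local_video_html(src, media_type, caption, high_res_url, poster=None):
    """HTML5 markup for a locally hosted access copy of a video.

    The caption links to a higher-resolution version (for example via a DOI).
    """
    poster_attr = f' poster="{escape(poster)}"' if poster else ""
    return (
        "<figure>\n"
        f'  <video controls preload="metadata"{poster_attr}>\n'
        f'    <source src="{escape(src)}" type="{escape(media_type)}">\n'
        # Fallback content, shown only where the video cannot be played
        f'    <a href="{escape(src)}">Download the video</a>\n'
        "  </video>\n"
        f"  <figcaption>{escape(caption)} "
        f'<a href="{escape(high_res_url)}">Higher-resolution version</a></figcaption>\n'
        "</figure>"
    )
```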
See also:
12. Start discussions about multimedia features early
14. Avoid depending on externally hosted web services
Created as part of the Brown University Digital Publications program and published by University of Virginia Press, Furnace and Fugue by Tara Nummedal includes a variety of audio and video features. Instead of using a third-party service to stream the audio and video, the project stores these files local to the website in a subfolder for its assets and embeds them using HTML <video> and <audio> tags. An example can be seen on the Essays “Interplay” page, where the site designers opted to use a <video> element rather than a service such as Vimeo or YouTube. Hosting these videos local to the site and embedding them using simple HTML tags reduces the chance that the audio or video will be lost over time and improves preservability.
To improve the likelihood that content published to the web can be captured using web archiving methods, developers could preload any content that would otherwise depend on user interactions. For example, suppose a feature makes repeated small API calls as the user interacts with it. If the dataset that supports the feature is small enough, load the data as a JSON file when the page loads instead, so that further server calls are not necessary.
This guideline describes another approach:
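A variation on this approach is to render the whole dataset into the page itself. The data then arrives in the same response as the HTML, and a crawler that captures the page also captures everything the feature needs. This is a minimal build-time sketch in Python. The element id and the JavaScript line it emits are illustrative. The technique only makes sense for datasets small enough to ship with every page view.

```python
import json


def page_with_inline_data(dataset, element_id="feature-data"):
    """Render a full dataset into the page as a JSON data block plus a loader.

    The data ships in the same HTTP response as the HTML, so the feature needs
    no further server calls and a crawler that saves the page saves the data.
    """
    payload = json.dumps(dataset, separators=(",", ":"))
    # "</" inside a <script> element could end it early; "<\/" decodes to the same JSON.
    payload = payload.replace("</", "<\\/")
    return (
        f'<script type="application/json" id="{element_id}">{payload}</script>\n'
        "<script>\n"
        f"  const data = JSON.parse(document.getElementById('{element_id}').textContent);\n"
        "  // ...draw the feature from `data` without further requests\n"
        "</script>"
    )
```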
50. Consider a “progressive enhancement” design to support a scriptless environment
A Mid-Republican House from Gabii on the Fulcrum platform features an interactive 3D visualization of an archaeological dig. The feature allows a user to navigate the visualization and use it to jump to relevant points within the EPUB text. Because of customizations in the Fulcrum web interface that support this feature, the exported version of the EPUB does not include this interactivity. Web archiving is likely the best approach to capture and archive the experience in this case. However, many interactive visualizations repeatedly call a live server as the user interacts with them in order to retrieve data for each new view. Such features are typically not well supported by web crawlers. A crawler cannot easily determine every permutation of user interaction within the visualization and then request each possible view of the data from the server to add to the archived copy. In the case of the Gabii project, however, the visualization was successfully archived using a web crawling approach. The Fulcrum team had selected WebGL technology for the visualization, and the implementation preloads all of the visualization data into the browser when the page is loaded. When a user interacts with the visualization, the site does not need to retrieve new data from the server to present a new view. This meant the feature could be archived with a web crawler in a form that preserved its interactive functionality.
Avoid using the “embed” option to insert a social media post into your publication. This can be unstable for preservation and for long-term sustainability since posts or accounts may be deleted. If the social media post is integral to the work, consider first taking a screenshot that can be embedded into the publication as an image. Underneath, a caption should indicate the origin of the post. In addition, use a web archive service such as archive.today or Internet Archive’s Save Page Now service to create a copy of the post—be sure to test the results, since archiving social media posts can be unreliable. The two links (live and archived) could be referenced as a citation or footnote depending on local practices.
These guidelines are also relevant to embedding social media posts in a publication:
8. Ensure terms of service cover preservation of data in third-party services
14. Avoid depending on third party services for core intellectual components
55. Consider ethical implications of embedding social media posts
Dynamic maps, such as those generated with Google Maps, consist of many smaller map tiles that are loaded on the fly as users pan and zoom. Web crawlers cannot easily capture this experience, nor can it be exported. If the map is not the focal point of the work and is being used to present a small number of locations, consider using one or more still images. Display the place name and coordinates for each pin in the caption and provide a link to a live map.
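A caption of the kind described above can be generated from the pin data. This sketch uses the Google Maps URLs search form, with api=1 and a query parameter, for the live link. The place name and coordinates in the test are only examples. Printing the coordinates themselves keeps the caption useful even if the link breaks.

```python
from html import escape
from urllib.parse import urlencode


def map_caption(place, lat, lon):
    """Caption for a static map image: place name, coordinates, and a live link."""
    coords = f"{lat:.5f},{lon:.5f}"  # 5 decimal places is roughly 1 m of latitude
    link = "https://www.google.com/maps/search/?" + urlencode({"api": "1", "query": coords})
    return (f"{escape(place)} ({lat:.5f}, {lon:.5f}). "
            f'<a href="{escape(link)}">View on a live map</a>')
```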
These guidelines offer alternative ways to manage dynamic map features:
16. Captions add important context to non-text features
53. Consider web page designs that pre-load all data when the page loads
Some web-based features require communication with a server that is driven by unpredictable user interaction or that uses an open-ended number of URLs to retrieve the data supporting the feature. These features cannot be exported easily because they depend on a live website. They also cannot be captured well by web archiving, which depends on identifying every unique URL. Examples include dynamic maps (e.g. Google Maps), full-text or faceted search, web forms, data visualizations (e.g. ArcGIS), IIIF image viewers, and streamed content. Some features can be redesigned to remove their dependency on a live server; if they can’t, publishers will need to consider what can be preserved. There are many strategies for this. For example:
- create a simpler static version that incorporates the key features for the purpose of preservation;
- embed a local copy of a server-based resource rather than depend on a third-party service;
- supply code or data for the feature with documentation for reassembling the functionality;
- record a video of the interaction as it behaves in the published environment for future playback;
- or use a combination of these.
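To see why tiled maps are hard to harvest, consider what a crawler would need to request. Web map tiles are usually addressed by zoom level, column, and row in the Web Mercator scheme. This sketch enumerates every tile URL for a bounding box. The tile server URL template is hypothetical. For a fixed area, the count roughly quadruples with each additional zoom level, which is why capturing every possible view quickly becomes impractical.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Web Mercator ("slippy map") tile column and row containing a point."""
    lat = max(min(lat, 85.0511), -85.0511)  # Web Mercator latitude limit
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)  # clamp edge cases


def tile_urls(west, south, east, north, zooms, template):
    """Yield every tile URL a crawler would need for a box and zoom levels."""
    for z in zooms:
        x0, y0 = lonlat_to_tile(west, north, z)  # top-left tile
        x1, y1 = lonlat_to_tile(east, south, z)  # bottom-right tile
        for x in range(x0, x1 + 1):
            for y in range(y0, y1 + 1):
                yield template.format(z=z, x=x, y=y)
```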
These guidelines offer alternative ways to manage features that depend on a live server:
16. Captions add important context to non-text features
53. Consider web page designs that pre-load all data when the page loads
63. Supply raw data, documentation for data visualizations
In Owning My Masters (Mastered) by A.D. Carson, some images are embedded in the EPUB using a IIIF viewer. As a user changes the view on these images by panning or zooming, the tool communicates with a live server to load new image tiles. This interactivity is difficult to preserve, whether via export or web harvesting. To ensure that the image can be viewed in the preserved copy, Fulcrum links this feature to a Resource page with a download button that retrieves a static copy of the image. This static copy can easily be harvested by a web crawler, since the button uses a standard HTML anchor link. Fulcrum also includes the static version of the image in its export package. This ensures that viewers can see the full-resolution image without needing the IIIF feature to function.
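Where an image is served through the IIIF Image API, a static full-size copy can be requested with a single URL. That makes it easy to offer the image as an ordinary link. This sketch builds such a URL in the Image API 3.0 form and wraps it in a plain anchor. It is not Fulcrum’s implementation. The server base and identifier are hypothetical, and the identifier is assumed to be URL-encoded already.

```python
from html import escape


def iiif_full_image_url(service_base, identifier, fmt="jpg"):
    """IIIF Image API 3.0 request for the whole image at its maximum size.

    URL pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    Older 2.x servers may expect "full" instead of "max" for the size.
    """
    return f"{service_base.rstrip('/')}/{identifier}/full/max/0/default.{fmt}"


def download_link(url, label="Download the full-resolution image"):
    """A plain anchor a crawler can follow without running the IIIF viewer."""
    return f'<a href="{escape(url)}" download>{escape(label)}</a>'
```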
Avoid obscure, dependent, or closed technologies that are harder to support due to the opaqueness of the underlying code, third-party dependencies that may be required for the tool to run, and limited preservation experience with the tool itself. Keep in mind, however, that open tools that are rarely used may be less stable than common proprietary tools that are widely adopted and have more community support.
These guidelines also refer to the use of open formats and technologies:
3. Use existing standards to guide design decisions on publishing platforms
11. Use non-proprietary, broadly supported and adopted open file formats
29. Consider a preservation-specific EPUB in your workflow
51. When embedding video and other media in web-based publications, host the media files local to the website using standard HTML
For publications where some content should not be preserved, consider tagging content in a consistent way that preservation export or harvesting processes can use to exclude items that should not be preserved. Platforms may want to facilitate this tagging.
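As an illustration of consistent tagging, suppose a platform marks excluded elements with a data attribute such as data-preserve="exclude". This is a hypothetical convention, not a standard. An export step could then drop those elements from XHTML content documents, as in this standard-library sketch. The sketch also keeps any text that immediately follows a removed element.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML without an "ns0:" prefix


def strip_excluded(xhtml, attr="data-preserve", value="exclude"):
    """Remove every element marked attr="value" from an XHTML document string."""
    root = ET.fromstring(xhtml)
    # Collect (parent, child) pairs first; removing while iterating is unsafe.
    marked = [(parent, child)
              for parent in root.iter()
              for child in parent
              if child.get(attr) == value]
    for parent, child in marked:
        if child.tail:
            # ElementTree stores following text as the child's tail; keep it.
            siblings = list(parent)
            index = siblings.index(child)
            if index > 0:
                previous = siblings[index - 1]
                previous.tail = (previous.tail or "") + child.tail
            else:
                parent.text = (parent.text or "") + child.tail
        parent.remove(child)
    return ET.tostring(root, encoding="unicode")
```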
These guidelines also concern the inclusion and exclusion of content in the preservation process:
10. Define and document core intellectual components that need to be preserved
20. Represent all core intellectual components of the work in the export package
40. Identify the rights for external web content
55. Consider whether it is ethical/appropriate to preserve social media
65. Ensure irrelevant or private administrative data is removed from data exports
Complex and interactive features of a publication may be the most vulnerable to change or loss over time, especially if they also have third-party dependencies. Where layout or interactivity is important to understanding the work, record a video walkthrough with the author that shows the original intent, so that it can be viewed, or the feature even recreated, in the future. Include this recording in the preserved version of the publication.
See also:
11. If adding a video to the preservation package, consider the format
20. Represent all core intellectual components of the work in the export package
71. Document and share the platform-level approach to preserving components of a publication
Stanford University Press publishes immersive digital monographs that feature non-linear navigation and innovative design. As part of its preservation strategy, the press includes a video walkthrough for each publication as documentation. This video serves both to introduce readers to an unfamiliar experience and as a record in case features stop functioning in the future. One example is the video walkthrough for Stephen Robertson’s Harlem in Disorder.